
    NamedMask: Distilling Segmenters from Complementary Foundation Models

    The goal of this work is to segment and name regions of images without access to pixel-level labels during training. To tackle this task, we construct segmenters by distilling the complementary strengths of two foundation models. The first, CLIP (Radford et al. 2021), exhibits the ability to assign names to image content but lacks an accessible representation of object structure. The second, DINO (Caron et al. 2021), captures the spatial extent of objects but has no knowledge of object names. Our method, termed NamedMask, begins by using CLIP to construct category-specific archives of images. These images are pseudo-labelled with a category-agnostic salient object detector bootstrapped from DINO, then refined by category-specific segmenters using the CLIP archive labels. Thanks to the high quality of the refined masks, we show that a standard segmentation architecture trained on these archives with appropriate data augmentation achieves impressive semantic segmentation abilities for both single-object and multi-object images. As a result, our proposed NamedMask performs favourably against a range of prior work on five benchmarks including the VOC2012, COCO and large-scale ImageNet-S datasets.
    Comment: Tech report. Code: https://github.com/NoelShin/namedmas
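
    A minimal sketch, not the authors' released code, of the first NamedMask stage described above: ranking unlabelled images against a class-name prompt with CLIP to build a category-specific archive. It assumes the openai/CLIP Python package and a list of image paths; the DINO-based pseudo-labelling and the refinement by category-specific segmenters are omitted.

```python
from pathlib import Path

import torch
import clip  # assumes the openai/CLIP package is installed
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

@torch.no_grad()
def build_archive(image_paths, category, top_k=500):
    """Return the top_k images most similar to a prompt naming the category."""
    text = clip.tokenize([f"a photo of a {category}"]).to(device)
    text_feat = model.encode_text(text)
    text_feat = text_feat / text_feat.norm(dim=-1, keepdim=True)
    scores = []
    for path in image_paths:
        image = preprocess(Image.open(path).convert("RGB")).unsqueeze(0).to(device)
        img_feat = model.encode_image(image)
        img_feat = img_feat / img_feat.norm(dim=-1, keepdim=True)
        scores.append((img_feat @ text_feat.T).item())
    ranked = sorted(zip(scores, image_paths), key=lambda s: s[0], reverse=True)
    return [path for _, path in ranked[:top_k]]

# e.g. archive = build_archive(sorted(Path("unlabelled").glob("*.jpg")), "dog")
```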

    ReCo: Retrieve and Co-segment for Zero-shot Transfer

    Semantic segmentation has a broad range of applications, but its real-world impact has been significantly limited by the prohibitive annotation costs necessary to enable deployment. Segmentation methods that forgo supervision can side-step these costs, but exhibit the inconvenient requirement of providing labelled examples from the target distribution to assign concept names to predictions. An alternative line of work in language-image pre-training has recently demonstrated the potential to produce models that can both assign names across large vocabularies of concepts and enable zero-shot transfer for classification, but these models do not demonstrate commensurate segmentation abilities. In this work, we strive to achieve a synthesis of these two approaches that combines their strengths. We leverage the retrieval abilities of one such language-image pre-trained model, CLIP, to dynamically curate training sets from unlabelled images for arbitrary collections of concept names, and leverage the robust correspondences offered by modern image representations to co-segment entities among the resulting collections. The synthetic segment collections are then employed to construct a segmentation model (without requiring pixel labels) whose knowledge of concepts is inherited from the scalable pre-training process of CLIP. We demonstrate that our approach, termed Retrieve and Co-segment (ReCo), performs favourably against unsupervised segmentation approaches while inheriting the convenience of nameable predictions and zero-shot transfer. We also demonstrate ReCo's ability to generate specialist segmenters for extremely rare objects.
    Comment: Tech report. Code: https://github.com/NoelShin/rec
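
    A much-simplified sketch of the co-segmentation idea, not the ReCo implementation itself: dense patch features from a self-supervised ViT (DINO, loaded via torch.hub) are averaged over a retrieved image set to form a shared concept embedding, and each image is then segmented by thresholding patch-wise cosine similarity to that embedding. Image loading and resizing to a multiple of the 16-pixel patch size are assumed to happen upstream, and the threshold is an illustrative choice.

```python
import torch
import torch.nn.functional as F

device = "cuda" if torch.cuda.is_available() else "cpu"
dino = torch.hub.load("facebookresearch/dino:main", "dino_vits16").to(device).eval()

@torch.no_grad()
def patch_features(images):
    """images: [B, 3, H, W] with H and W divisible by 16; returns [B, N, D]."""
    tokens = dino.get_intermediate_layers(images.to(device), n=1)[0]  # [B, 1 + N, D]
    return F.normalize(tokens[:, 1:], dim=-1)  # drop the CLS token

@torch.no_grad()
def co_segment(images, threshold=0.5):
    """Coarse per-patch masks for a batch of images retrieved for one concept."""
    feats = patch_features(images)                         # [B, N, D]
    concept = F.normalize(feats.mean(dim=(0, 1)), dim=-1)  # shared embedding [D]
    sims = feats @ concept                                 # cosine similarity [B, N]
    side = int(sims.shape[1] ** 0.5)                       # assumes square inputs
    return (sims > threshold).reshape(-1, side, side)
```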

    Nighttime Reflectance Generation in the Visible Band of Satellites

    Visible (VIS) bands, such as the 0.675 μm band in geostationary satellite remote sensing, have played an important role in monitoring and analyzing weather and climate change during the past few decades, offering coarse spatial but high temporal resolution. Recently, many deep learning techniques have been developed and applied across a variety of applications and research fields. In this study, we developed a deep-learning-based model to generate non-existent nighttime VIS satellite images using the Conditional Generative Adversarial Nets (CGAN) technique. For training and validating our CGAN-based model, we used daytime image data sets of reflectance in the Communication, Ocean and Meteorological Satellite / Meteorological Imager (COMS/MI) VIS (0.675 μm) band and radiance in the longwave infrared (10.8 μm) band of the same sensor over five years (2012 to 2017). Our results show high accuracy (bias = −2.41 and root mean square error (RMSE) = 36.85 during summer; bias = −0.21 and RMSE = 33.02 during winter) and correlation (correlation coefficient (CC) = 0.88 during summer, CC = 0.89 during winter) between the observed and CGAN-generated images for the COMS VIS band. Consequently, our CGAN-based model can be effectively used in a variety of meteorological applications, such as cloud, fog, and typhoon analyses, during both daytime and nighttime.
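
    A small sketch of how the reported verification statistics (bias, RMSE and correlation coefficient) between observed and CGAN-generated VIS images are typically computed; the pixel units and any quality masking used in the study are assumptions here, not taken from the paper.

```python
import numpy as np

def verification_stats(observed: np.ndarray, generated: np.ndarray):
    """Return (bias, RMSE, Pearson correlation) over all pixels of two images."""
    obs = observed.astype(float).ravel()
    gen = generated.astype(float).ravel()
    diff = gen - obs
    bias = diff.mean()                  # mean error (generated minus observed)
    rmse = np.sqrt((diff ** 2).mean())  # root mean square error
    cc = np.corrcoef(obs, gen)[0, 1]    # Pearson correlation coefficient
    return bias, rmse, cc
```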